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Abstract. It is the purpose of the paper to describe the virtues of time-frequency methods for signal process- 
ing applications, having astronomical time series in mind. Different methods are considered and their potential 
usefulness respectively drawbacks are discussed and illustrated by examples. As areas where one can hope for 
a successful application of joint time-frequency analysis (JTFA), we describe specifically the problem of signal 
denoising as well as the question of signal separation which allows to separate signals (possibly overlapping in 
time or frequency, but) which are living on disjoint parts of the time-frequency plane. Some recipes for practical 
use of the algorithms are also provided. 
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1. Introduction 

The analysis of time series has always been an issue of 
central interest to astronomers. Indeed, the most natural 
way to get some insights on the characteristics of a phys- 
ical system consists in studying its behavior over time. 
Unfortunately, maybe because until few years ago the 
most interesting objects known to change their luminos- 
ity were periodic/semiperiodic stars and multiple star sys- 
tems, time series analysis became synonymous of research 
of periodicities and power-spectrum became the tool for 
signal analysis. Only in the eighties people began to realize 
that various astrophysical objects, principally extragalac- 
tic sources, can show very complex and unpredictable time 



evolutions (see, for example, Miller & Wiita 1991; Duschl 



et al. 1991). In spite of that, although inadequate, power- 



spectrum continued to be the preferred approach by most 
of researchers. Such a predilection can be explained with 
the substantial failure of the methods proposed as alter- 
natives ( |Vio ct al. 1992D . The main limitations of such 
methods are two: 

- The implicit assumption of stationarity of the pro- 
cesses underlying the signals. In other words, an ob- 
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served signal is assumed to be characterized by statis- 
tical properties that do not change with time. However, 
real signals in their vast majority are nonstationary. 
- The implicit assumption that it is possible to obtain 
the equations describing the evolution of a given sys- 
tem solely through the analysis of an experimental sig- 
nal. In reality, if one takes into account that a given 
time series is only the one-dimensional output of a sys- 
tem which very often can be described only in terms of 
a set of ordinary and/or partial differential equations 
(in other words a time series is the one-dimensional 
projection of a more complex dynamics), it is easy to 
understand that this position is not well founded. 

The consequence of these points is that if not carried out 
in a well defined physical context, signal analysis can be 
expected to be useful only for explorative purposes (e.g. 
feature detection) and/or for filtering tasks (e.g. removal 
of unwanted components as, for example, noise or spurious 
contributions). Furtehrmore, in order to obtain meaning- 
ful results, the nonstationarity of signals must be explicitly 
considered in the analysis. 

2. Data exploration 
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Fig. 1. a) Realization of the non-stationary process de- 
scribed in the text, b) Corresponding normalized power- 
spectrum. 

2.1. Limits of the classical approach 

It is well known that the power-spectrum PS(^) of a con- 
tinuous and integrable (possibly complex) signal x(t) is 
defined as 

PSH = |F1»| 2 , (1) 

where the frequency range is the full real line, i.e. — oo < 
v < +oo, and 

(2) 



FT,,) . / ««, e — « 



is the Fourier transform of x(t). PS(^) can be considered 
as a measure of the similarity between signal x(t) and the 
complex sinusoid exp(— j2itvt). Indeed, if FT(u) is again 
integrable, the Fourier inversion formula reads 



.. (t) = / FT M *~ 



(3) 



i.e., the signal can be written (uniquely) as a kind of super- 
position of pure frequencies. Therefore, the larger PS(^), 
the more important is the contribution of the pure fre- 
quency v within x(t). This is the reason why PS(z^) is so 
useful in the search of periodic components. However, this 
is also it s more serious limit in the ana lysis of non-periodic 
signals ( Feichtinger fc Strohmer 1998 ). Fig.|l|a shows a sig- 
nal constituted by two sinusoids with amplitudes different 
from zero only within a short time interval: 

x(t) = ai(t) sin(27T^it:) + 02 (t) sin(27w 2 £) (4) 

with v\ =0.1 Hz, h>2 — 0.4 Hz and 



ai{t),a 2 (t) 



1 if 10 < t < 50 and 70 < t < 110 
otherwise, 
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Fig. 2. Grayscale image of the ideal power-spectrum 
IPS(i, v) corresponding to the signal shown in Fig.|l|a. 

with t expressed in seconds. The two key facts are: 

- this signal clearly is not periodic and 

- it does not maintain its characteristics constant over 
time (in poor words, it is non-stationary). 

From Fig.|l|b it is evident that PS(^) is not able to ex- 
tract the salient features of x{t). In particular, the power 
spectrum clearly contains no information about the time 
intervals on which these two frequencies occur. Moreover, 
since PS(z/) ^ also at frequencies v ^ Vi,v%, one could 
erroneously infer that x(t) contains more than two sinu- 
soidal components. The main problem is that, if a sig- 
nal changes its characteristic over time, a representation 
limited exclusively to the frequency domain is absolutely 
inadequate. Indeed, the function exp(— j2irvt) does not 
have any temporal localization: it has neither start nor 
end. This is an useful characteristic only in case of sig- 
nals with constant statistical properties (i.e. periodic or 
stationary signals). Unfortunately, in nature this kind of 
signals is not so common. Therefore, for a good analysis of 
most "natural" signals it is necessary to use a representa- 
tion that is able to consider also the temporal dimension. 
A possible solution is a representation, that we call ideal 
power-spectrum, IPS(t, v), defined as the squared ampli- 
tude of the sinusoids that at a given time instant t consti- 
tute the signal of interest. That is shown in Fig.|| where 
it is possible to realize that with IPS(t, v), contrary to 
PS(i'), the characteristics of the signal in Fig.pi are very 
well defined; it is clear that such a signal consists of two 
pure sinusoids having different frequencies but amplitudes 
which evolve in the same way. 

2.2. Time-frequency representation of continuous 
signals 

In the previous section we have seen that in case of 
non-stationary signals a joint time-frequency rcprcscnta- 
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tion (JTFR) might be a good analysis tool. However, 
the JTFR in Fig.|| has been obtained directly from the 
model with which the time series has been simulated. 
Unfortunately, in an experimental context the dynami- 
cal model underlying a given time series will usually be 
unknown. Consequently, the problem consists in the way 
such a JTFR can be estimated from a set of available data. 
The simplest solution is given by the so called Fourier 
Spectrogram, FS(t,v), 



FS(i,i/) = |STFT(t,i/)| 2 



(6) 



where — oo < t < +00, — 00 < v < +00, and 
+00 

STFT(£, v) = J w*{t-r) x{t) e- j2 ™ T dr, (7) 

— oo 

(symbol "*" stands for complex conjugation), is the so 
called short time fourier transform. Here 

w(t) for -T/2 < t < T/2; 
~ otherwise 

is a window function usually indicated with the term of 
analysis function. The rationale is that, for signals x(t) 
which maintain their harmonic components unchanged at 
least approximately within a time interval of length T and 
centered at t , FS(i Q , v) can be considered an estimate of 
IPS^o,^)- Although effective, this methods is character- 
ized by two drawbacks: 

- since FS(t, v) is estimated via the Fourier transform 
of only a segment of the available signal, it suffers a 
degradation of the frequency resolution inversely pro- 
portional to the length of w(t). For example, if w(t) is 
the rectangular window 



w(t) 



1 for -T/2 < t < T/2; 
otherwise, 



(9) 



it is easy to show that 

STFT(i , v) = FTO) <g> sinc T (i , v), (10) 



where the symbol "(H)" stands for convolution, and 

smiTni/ 



sinc T {t a ,v) 



(ii) 



This means that, in FS(t D , v), the narrowest structure 
along the frequency direction is charaterized by a fre- 
quency support equal to 2/T. Such value corresponds 
to the full width at zero intensity of the main lobe in 
the graphical representation of |sincT(^o, v)\ 2 \ 
since FS(t, v) is calculated by using all the points in 
a interval of length T centered at t, it suffers also a 
degradation of the resolution along the time dimension 
which is proportional to the length of w(t). Indeed, in 
case of a signal constituted by an impulse at time to, 
and again with a rectangular w(t), FS(t, v) ^ when 
t Q — T/2 <t< t + T/2. In other words, the narrowest 
structure along the time direction is charaterized by a 
time support equal to T. 



From these two points it is easy to understand that the 
main limitation of FS(t, v) lies in the impossibility to in- 
crease simultaneously the time and the frequency resolu- 
tions. In other words, there is a trade-off: if w(t) is chosen 
to have good time resolution (i.e., smaller T), then its fre- 
quency resolution must be deteriorated and vice versa. In 
principle, a possible way to mitigate this occurrence is the 
choice of a window w(t) for which the degradation of the 
time-frequency resolution is reduce to a minimum. In this 
regard, it has been proved (e.g., Qian fc Chen 1996| ) that, 
if the time- frequency resolution of a representation is mea- 
sured by its energy concentration in the time-frequency 
domain, then the Gaussian function 



-w-(;) : 



(12) 



represents an "optimal" choice. Without any a priori in- 
formation on the spectral characteristics of the signal, a 
is a free parameter with which it is possible to balance be- 
tween an increased time or frequency resolution according 
to the current necessities. In reality, if w(t) is a smooth 
function, well localized around zero, it will serve well as 
a window for the above purpose, independently from the 
precise analytic form. In this regard note that, because 
of its lack of smoothness, the rectangular window, used 
above only for ease of discussion, does not represent a 
good choice because of its very poor frequency resolution. 

2.3. Time-frequency representation of discrete signals 

In practical situations a signal is available only in discrete 
form, say x[k], obtained by sampling x(t) on a regular grid 
of L s time instants k = 0, 1, 2, . . . , L s — 1. Consequently, 
FS(i, v) and STFT(t, v) have to be modified to their dis- 
crete variants: 

DFS[m,n] = |DSTFT[m, n]\ 2 ; (13) 

L.-l 

DSTFT [m,n] = ^ w*[k-m] x[k] e -&™ k / L % (14) 

fc=0 

with to, n, < m, n < L s — 1, representing the discrete 
time and frequency indices, respectively. 
Fig.|| shows the representation DFS[m, n] of a time se- 
ries obtained by sampling the signal in Fig|l] with a time 
step of one second: in spite the loss of resolution described 
above, the salient characteristics of the signal remain well 
defined. 



2.4. Some practicalities 
2.4.1. Digital implementation 

The calculation of DFS[m, n] requires some technical ad- 
justments. The first one regards the edge effects, due to 
the finite length of the signal, that do not permit the 
calculation of this representation for to < L w /2 and 
to > L s — 1 — L w /2, with L w the length of w[k]. Usually 
this problem is solved through one of the following three 
methods: 
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Fig. 3. Contour plot of the DFS[m, n] representation for 
the discrete time series, shown in the uppermost panel, 
corresponding to the signal in Fig.[l]a when sampled at 
a frequency rate of 1.0 Hz (the corresponding power- 
spectrum is shown in the leftmost panel) . In this example 
the analysis function is an Hamming window with length 
L w — 31. Levels are uniformed distributed between the 
15% and 85% of the peak value. 



= DSTFT[mAM, nAN] 



L s -1 



(16) 



J2 w*[k-mAM] x[k] e -&™kAN/L 3 



k=0 



where < m < M = L S /AM, < n < N = 
L S /AN, with AM and AN integer divisors of L s such 
that L S /(AM x AN) > 1. In such a signal representation 
the coefficients a rnn are called the Gabor coefficients and 
the equality (|l6|) goes by the name Gabor representation 
(GR). 

Given the above arguments, in order to avoid "un- 
necessary" redundancy, one could be tempted to use a 
SDFS[m, n] containing only L s elements. In reality, nu- 
meri cal stability of the expansion and easiness of the anal- 
ysis ( Daubechies 1992 , and see below) , make it necessary 
to work with redundant representations. The degree of re- 
dundancy is expressed through the so called oversampling 
rate, say a, 



number of elements of SDFS[m, n] 
length of x[k] 



(17) 



that will take values in the range [l,£ a ]. When a = 1, 
SDFS[m, n] is said to be sampled at a critical rate, whereas 
when a > 1 it is said oversampled. Using a full short-time 
Fourier transform corresponds to the case a = L s . 



1) by assuming x[k] and w[k] to be periodic with period 
L s (NB. since usually L w < L s , w[k] has to be zero 
padded) ; 

2) by zero padding one of the ends of x[k] with L w data 
and assuming the new time series to be periodic with 
period L s + L w (for an efficient implementation of this 
method see |Qian fc Chen 1996j ); 

3) by imposing that < m < Mr, with Mt = L s — L w + 1 
the number of possible shifts of w[k] inside the signal 
length (for an efficient implementation of this method 
see 



Munk 1997, 2001) 



If not stated otherwise, in the following the discussion will 
be developed according to the first approach. 

A more serious drawback is represented by the fact 
that, by increasing the length of x[k], the corresponding 
DFS[m,n] becomes quickly an huge matrix. Indeed, al- 
ready L s = 1000 will result in a matrix with ~ 10 6 el- 
ements. A viable solution is to cut the signal in shorter 
segments and then to analyze them separately. An alterna- 
tive comes out from the observation that DFS[m, n] is an 
highly redundant representation; it is composed by ~ L 2 S 
elements that, however, have been obtained from only L s 
data. Since L s is the dimension of the signal space, at 
most L s elements can be linear independent. This means 
that, in principle, it is possible to recover arbitrary sig- 
nals if samples of DFS[m,n] for at least L s coordinates 
are available. According to this observation, Eqs.(p^),(p^) 
could be modified in 



2.4.2. Which L w for w[k]l 

One important step in the construction of JTFR's, such 
as SDFS[m, n], is the choice of the analysis function w[k}. 
Although the exact shape of this function is not so critical, 
the value of L w can severely influence the final result. In 
fact, in order to be meaningful, a SDFS[m, n] has to reflect 
both the instantaneous frequency content and the (possi- 
ble) non-stationarity of the signal. However, very often 
these two requirements conflicts one with the other. The 
reason is that a large L w is preferable when the frequency 
content is of interest and shows little changes, whereas 
the time evolution of a signal can be better tracked with 
a small L w . Again, we are forced to decide within a trade- 
off. 

Despite its importance and its long-time use in sig- 
nal processing, surprisingly this subject has found little 
attention in current literature (however, see Stankovic & 



Katkovnik 1999; Stankovic 2001, and references therein) 



SDFS[ 



(15) 



A possible explanation is that this problem is not so well 
defined. Indeed, the concept of "optimal" L w is strictly 
linked to the problem at hand. For example, if one is in- 
terested in tracking the "long term" evolution of a sig- 
nal rather than possible short "bursts", then larger L w 's 
represent the "best" choice. The contrary is true if the 
short term evolution is of interest. However, in certain ap- 
plications (e.g. automatic signal detection) it should be 
useful to have an "objective" method able to provide a 
"good" L w without the intervention of an operator. Of 
course, that requires the adoption of an optimality crite- 
rion. At this regards, a possible choice is that L w provides 
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Fig. 4. Grayscale image of the DFS[m, n] representation 
relative to the chirp signal x(t) = sin[27rfi/(i)] with v(t) = 
4.0 x 10~ 3 x t + 0.10 and < t < 100 sec, when sampled 
with a frequency rate of 1.0 Hz and contaminated with 
a discrete, Gaussian white noise process with standard 
deviation equal to 0.3. 

the representation with the "best" global time-frequency 
resolution. In this case the following procedure is useful: 

- a set, say A, of integer values for L w is fixed; 

- the SDFS[m, n]'s corresponding to the L w £ A are 
calculated; 

- for each of these SDFS[m, n] the two-dimensional au- 
tocorrelation function ACF[fc,Z] is computed: 

ACF[jfc,i] = 

M-l N-l 

= YjJ2 SDFS[m, n] SDFS[m + k, n + I] (18) 

rn— n— 

with —M <k< M and -N < I < N. 

- the value of L w to which corresponds the "sharpest" 
ACF[m,n] is selected as the searched length. 

The idea behind such a procedure is that the concept of 
"best" global resolution implies a representation maxi- 
mally concentrated in the time-frequency plane. Indeed, 
a value of L w too large or too small will results in a 
bad resolution along the time or frequency dimension and 
therefore in a representation that tends to spread in the 
time-frequency domain. The use of ACF[fc, I] stems by 
the observation that the larger the spread of the repre- 
sentation, the more similar (correlated) the values are of 
the adjacent elements in SDFS[m, n]. This will reflect in 
ACF[fc, Z]'s with significant values also for lags k, I differ- 
ent from zero. The only open question in this procedure 
is the measure of the "sharpness" of AGF[fc, /]. At this 
regards there are various possibilities. In our numerical 
experiments we have seen that the simple count of the el- 
ements of ACF[fc, I] with values larger than a given thresh- 
old, say 50% of the maximum, is able to provide already 
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Fig. 5. Histogram concerning the lengths L w provided in 
100 simulations by the autocorrelation method described 
in the text. The signal used in this simulation it is shown 
in the uppermost panel of Fig.^ and the target value is 
L w = 31. 

satisfactory results. This method for the estimation of L w 
works very well even in the situation of very noisy data 
(see Figs.[| and |J). The only concern regards the calcu- 
lation of ACF[fc,Z] that, if carried out on the basis of its 
definition (^|), can be computationally very expensive. 
Indeed, for saving computing time, this operation has to 
be carried out in the Fourier domain via the inverse double 
Fourier transform of the bidimensional power-spectrum of 
SDFS[m, n]. Although very fast, with this approach it is 
implicitly assumed that SDFS[m,n] is periodic with re- 
gards to both indices; this fact introduces edge effects. 
Fortunately, they can be avoided via zero padding; if in- 
terested in the correlation for lags as large as ±K along 
both dimensions, then both ends of SDFS[m,n] must be 
zero padded in such a way to obtain a matrix with dimen- 
sions (M + K) x (N + K). 

2.5. How to improve the resolution of the 
time-frequency represen ta tions ? 

The main limitation of SDFS[to, n] is its bad resolution. In 
many practical situations that is not a so serious problem. 
However, in certain applications (e.g. tracking of quasi- 
periodic sources) more precise results can be required. 
This is a subject that has been extensively treated in cur- 
rent literature. Here three methods are shortly presented 
that are able to provide interesting results. We warn, how- 
ever, that this list is by no means complete and other im- 
portant approaches are available in current literature as, 
for instance, the reduced interference distributions ( pohcn 
19951). 
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2.5.1. Wigner-Ville distribution 

The loss of resolution suffered by the FS(t, v) is due to 
the fact that, for each time instant t, STFT(<, v) is calcu- 
lated by using only a segment of the signal. In principle, if 
such windowing were avoided, it could be possible to ob- 
tain a JTFR with a much better resolution. Indeed such 
a JTFR does exist, it is named Wigner-Ville Distribution, 
and comes out from the problem of defining a representa- 
tion, say WVD(t, u), with the property that: 



Signal 



+ 00 



WVD(t,v) dv = \x{t)f 



WVD(t,t/) dt = PS(i/) 



(19) 



(20) 



In other words, the marginals with respect to frequency 
and time have to provide, respectively, the squared signal 
and the corresponding power-spectrum. The form of this 
function, that however is not the only one sharing proper- 
ties (|l9| ) and (|2p|), is given by ( Flandrin 199E ; Grochcnig 
200l[jj 

WVD(t,v) = 



s(f + t/2) x*(t - t/2) e - j2nVT dr. (21) 



In reality, the use of this JTFR has been rather limited 
because of two drawbacks: 

- WVD(£, v) can go negative. In fact, except very par- 
ticular cases, e.g., x(t) given by a Gaussian function, it 
will go negative for all signals. The non-positive prop- 
erty prohibits an energy interpretation of this repre- 
sentation; 

- the quadratic nature of WVD(i, v) determines the oc- 
currence of non-linear effects when dealing with mul- 
ticomponents signals. 

This last point is particularly troublesome. In fact, if 
x(t) = g(t) + r(t), it happens that 

WVD x (t,v) = 

= WVD g (t, v) + WVD r (t, v) +2 • fie{WVD Sir (t, v)} 



Auto— Terms 



Cross— Terms 



(22) 



where 



WVD fl>r (t,i/) 



g(t + t/2) r* (* - t/2) e~^ vt dr. (23) 



In general, the WVD(t, v) corresponding to a signal con- 
taining N components will be characterized by TV auto- 
terms, that constitutes the desired information, and 0.5 x 
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Fig. 6. Grayscale image of the WVD[m, n] representation 
for the time series shown in the uppermost panel of Fig.|[ 

N x (jV — 1) real cross-terms which can be considered a 
sort of unwanted interference disturb. The worst point is 
that the intensity of such cross-terms is larger than those 
of the corresponding auto-terms. As shown in Fig.^, that 
can mean unreadable results. In spite of these problems, 
WVD(t, v) has very interesting properties not shared by 
most of the other representations. For this reason, in these 
last years, one of the most important subject in the field of 
time-frequency analysis has been the research of methods 
able to reduce the importance of the interference terms 
with the minimum deterioration of such properties. 

2.5.2. The reassignment method 



It is possible to show (Flandrin 1999) that the spectro- 



gram FS(i, v) is the 2D-convolution of the Wigner-Ville 
distribution of the signal x(t), say WVD x (t, v) with the 
Wigner-Ville distribution, say WVD w (t, v) of the window 
function w(t): 

FS(t,v) = 

+ 00 +OO 

= J j WVD x (s,Z)WVD w {t-s,v-Z) ds d£. (24) 



—00 —00 



From this equation it is possible to see that the reason why 
FS(t, v\ contrary to WVD(t, u), does not present the in- 
terference terms is due to the fact that such terms are 
smoothed out by the convolution. Such operation, how- 
ever, deteriorates the time-frequency resolution. 

A closer look at Eq.(p4|) shows that W w (t — s, v — £) 
delimits a time-frequency domain at the vicinity of the 
(t, v) point, inside which a weighted average of the WVD X 
values is performed. This is the reason of the bad time- 
frequency resolution of FS(t, v). Indeed, the key point of 
the reassignment principle is that these values have no rea- 
son to be symmetrical distributed around (t, v), which is 
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the geometrical center of this domain; their average should 
not be assigned at this point, but rather at the center of 
gravity of this domain, which is much more representa- 
tive of the local energetic distribution of the signal. In 
practice, with the reassignment approach one moves each 
value of the spectrogram computed at any point (t, v) to 
another point (£, v) which is the center of gravity of the 
signal energy distribution around (t, v) : 



Signal 



-t-oc -t-oc 

J J s WVD w {t- s,v-£) WVD x (s,£) dsd£_ 



i{t, v) 



+00 +00 



J J WVD«,(t-a,i/-f) WVD x (s,£) dsdt, 

(25) 



J J i WVD„(t - a, v - WVD x (s, £) dsdt; 



u(t,u) 



— 00 —00 



+00 +00 



J J WVD, [t - s, v - e) WVD, {s, dsdt; 



(26) 

and thus leads to a reassigned spectrogram, whose value at 
any point (t' , v') is the sum of all the spectrogram values 
reassigned to this point: 

FS r (t',z/) = 

FS(t, 1/) $(*' - t(t, v)) 8{v' - v(t, v)) dt dv 



(27) 

Fig.0 show FS r [fc, i] for the data shown in Fig j^. A detailed 
description of the algorithm and of its digital implemen- 
tation can be found in Chassande-Mottin (1998| ). 
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Fig. 7. Grayscale image of the FS r [m, n] representation 
for the time series shown in the uppermost panel of Fig.|[ 



These observations suggest that, if WVD(t, r) can be 
decomposed as the sum of two-dimensional (time and 
frequency) localized harmonic functions, then we could 
use the low-order harmonic terms to delineate the time- 
dependent spectrum with reduced interference. High-order 
harmonics that introduce high oscillation have relatively 
small averages and thereby have negligible influence on 
the useful properties. The signal energy and useful prop- 
erties are mainly determined by a few low-order harmonic 
terms. On these bases, they propose the so called Gabor 
spectrogram GSd representation 



GS D [k,l] = ^P d [fc,Z] 



(29) 



ri=0 



where 



Pd[k,l] = 2^a m „ a* 



2.5.3. Gabor spectrogram 



Recently, Qian & Chen (1996) have presented an interest- 
ing technique for improving the time-frequency resolution 
of FS(t, v). They start from the consideration that each 
cross-term in WVD(i, v) is located midway the pair of 
corresponding auto-terms and tends to highly oscillate in 
both time and frequency directions. On the other hand, 
useful properties, such as the time marginal, frequency 
marginal and instantaneous frequency, are obtained by av- 
eraging WVD(t, v). For example, the mean instantaneous 
frequency v- ms t , a quantity that is commonly used for char- 
acterizing non-stationary signals, can be evaluated via: 



^inst 



v WVD(t, v) dv 
WVD(t,v) dv 



(28) 



f [k- AM(m + m')/2} 2 [I - AN(n + n')/2} 2 
exp < ^ — > x 



exp { —— [k(n - n')AN- 



2af 



+ { I - ^p-AA^ AM (to' - to) 



, (30) 



where 



A c i = {{m, tii), (n, n ); |m — m \ + \n — n \ = d}, (31) 



0~k 



2^ 



07 



Ti20F' 



(32) 



with < m,m' < M tot - 1, -A/2 < n,n' < A/2 - 1, 
< k < L s , and L w /2 < I < L w /2 - 
parameter whose value can be chosen 



1. Here, T\ is a 
a \J AM ■ N and 
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a mn are the Gabor coefficients calculated via Eq. (|16|) that 
here we rewrite in a slightly modified form 



L =256, o=1, M=16, N=16 



L =256, C6=2, M=32, N.16 



—j2irmnAM/N 
L m -1 



J2 w*[k] x[k + m, AM] e~ j27r 



-j2-nkn/N 



(33) 



fe=0 



< m < M T = [{L s - L W )/AM + 1], < n< N. It is 
possible to show that the larger the value of D the better is 
the resolution of the representation but, at the same time, 
the higher is the contribution of the high-order harmonics 
in the decomposition of WVD(t, v). That means that there 
is a trade-off between time-frequency resolution and the 
importance of the interference terms. Usually D = 3,4 
is sufficient for many practical applications. For the time 
series of FigM GS4 [£;,/] provides results very similar to 
those of FigJ^TA detailed description of the algorithm and 
of its digital implementation can be found in Munk (1997, 
200ll). 



3. Time-variant filtering of discrete signals 

Although data exploration is an interesting application of 
time-frequency analysis, the issue where such an approach 



show s all its potency is signal fil tering ( Matz fc Hlawatsch 
2002| ; |Hlawatsch fc Matz 200^ ). Indeed, given the capa- 



bility of a JTFR in characterizing a non-stationary signal, 
we may expect that a filtering technique, developed within 
this framework, be able to adapt its action to the instanta- 
neous properties of the signal itself. Since it is much easier 
to treat this subject in the context of the linear represen- 
tations (WVD, GSd and FS r are bilinear representations 
carrying no phase information in a form easy to use), we 
shall concentrate in our discussion on the use of the Gabor 
transform. 



3.1. Some preliminary notes 

We have seen that the GR of a discrete signal corresponds 
to map a one-dimensional array onto a two dimensional 
matrix. Of course, in order to benefit of this kind of rep- 
resentation for signal filtering purposes, it is necessary to 
be able to do the reverse job. It can be shown that the 
relationship between a discrete signal x[k] and the corre- 
sponding GR is given by ( prochenig 200l| ) 



x[k] = 



Af-l N—l 

EE 

m— n— 



g[k - mAM] e 



j2-nkn/N 



(34) 



with < k < L s and 



a,. 



^2 w*[k- mAM] x[k] e 



-j2irkn/N 



(35) 



fe=0 



with < m < M, < n < N . Here, the synthesis function 
g[k] and the analysis function w[k] have to satisfy the so 




L =256, a=16, M=64, N=64 
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Fig. 8. Gaussian analysis window w[k) and the corre- 
sponding synthesis windows g[k] for different values of the 
oversampling rate a. For easyness of comparison, functions 
g[k] are normalized to unit area. 

called biorthogonality condition (or Wexler-Raz identity). 



E 



w*[k] g[k + P N] e -^ fe «/ AM = 



AM 



S p S q 



(36) 



with -AN < q < AN, < p < AM. Pairs of functions 
satisfying Eq.(|3^) are termed dual functions or windows. 
Since this formula is symmetric with respect to g[k] resp. 
w [k], duality is a symmetric relation. 

For oversampled GR, with a > 2L W /(L W + AM), 
the system of linear Eq.([56|) is underdetermined. In other 
words, g[k] and w[k] do not determine each other in a 
unique way. Hence, for a given g[k] there is a certain free- 
dom in the choice of the correspond ing dual complcmcn- 
tary w[k]. The solution suggested by Qian fc Chen (1996 ) 
is to solve system (|3^) with the constraint, say T, that the 
shapes of g[k] and w[k] be as close as possible (in least 
squares sense): 



L s -1 



fc=0 



El w [ K i _ 9\M 
, . \ \\w\\ \\g\\ 



(37) 



This choice turns out to correspond to the solution of the 
system ( p^ ) via a pseudo-inverse method. For example, by 
supposing g[k\ fixed and rewriting system (|36| ) in a matrix 
notation, 



Hw* = u, 



(38) 



the standard synthesis function w, i.e., the window satis- 
fying Eq.(B6|) with minimal norm, is given by 



H*(HH*) 



(39) 



It can be shown that the larger a the more similar are 
g[k] and w[k] (see Fig.|J). In particular, when a = L s (i.e., 
AM — AN = 1) then the two functions are identical. 
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Fig. 9. Dependence of the standard deviation (STD) of 
the residuals between the true and the reconstructed sig- 
nals on the value of the noise threshold 7„. The signal used 
in this example is that shown in the uppermost panel of 
Figf 

In case of a critical sampled GR (i.e. a — 1,) w[k] 
and g[k] are uni vocally determined. However, according 
to the Balian-Low theorem, both the functions cannot be 
simultaneously well localized in time and frequency (see, 
for example, Mallat 199S| ). In the practical application, 
the consequence is that w[k] and g[k] have very differ- 
ent shapes. This is the main reasons why in the filtering 
problems it is preferable to work with redundant repre- 
sentations (see below). 

3.2. Time-variant signal denoising 

As shown in Fig.^, the choice of 7 Q is an important step in 
a filtering procedure. Unfortunately, without substantial 
a priori information (typical situation in many applica- 
tions), there are no so many ways to select the "optimal" 
value for this threshold. 

In many practical situations the signal of interest is 
contaminated by a broad-band random noise (white- 
noise). The key point is that such a noise tends to 
spread evenly over the entire joint time- frequency domain, 
whereas the signal component is usually concentrated in 
a relatively small region (e.g., see Fig.||). This fact sug- 
gests a simple procedure, similar to the hard thresholding- 
method introduced by Donoho for wavelets (see, for exam- 
ple, Donoho 1995), to filter out the noise contribution: 



1) 



determination of those indices for which the corre- 
sponding coefficients a mn may be considered as con- 
tributed by the noise. This step can be carried out in 
several ways according to the problem at hand. A typ- 
ical solution consists in setting a threshold value, say 
7„, for the elements |a m „| 2 of SDFS[m, n]. The idea is 
that the coefficients a mn of GR, whose absolute value 
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Fig. 10. a) Histogram of the \a mn \ coefficients for a re- 
alization of a noise process with the same characteristics 
of that contaminating the signal of Fig.^. b) Histogram 
of the | Omni coefficients for the signal of Figji[ The three 
arrows indicate the 95%, 99%, 100% empirical percentiles 
relative to the histogram of the simulated noise process 
and provide three possible choices for j a . 

is smaller than 7 a , can be assumed unaffected by the 
signal contribution; 

2) masking (zeroing) of such coefficients in order to obtain 
a "cleaned" version of the coefficients, denoted by a mn , 
equal to zero in areas only occupied by noise; 

3) synthesis of the filtered signal x[k] via Eq. (|34|) . 

Although this approach is fairly simple, its application re- 
quires some care in order to avoid unpleasant side effects. 
In particular, one point is important: 

once the analysis function g[k] is fixed, it is advis- 
able that the synthesis function w[k] be calculated 
via Eq.(^). In other words, the shapes of g[k] and 
w[k] have to be as close as possible. 

The good reason for this is that g[k] is often chosen to 
have a good localization both in time and frequency. If 
w [k] would not be forced to share this property it might 
be badly localized along one or both of these two dimen- 
sions. The consequence is that the synthesis of x[k], via 
the coefficient a mn , could provide different results from 
what expected; 

Here it is necessary to stress that, in general, the set 
of modified coefficients a m n will not constitute a valid GR 
representation of any signal whatever. Indeed, it happens 
that 

M-l JV-l 

*]^^ a mn g[k~ mAM) e ^ kn ' N . (40) 

m— n— 

In particular the coefficients a mn , obtained from x[k] via 
transform (|35|), can be different from zero even for those 
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indices mo, no, for which a„ 



= 0. The reason is that 



coefficients a mn are obtained via a convolution operation 
and therefore no physical signal can correspond to a rep- 
resentation a mn characterized by the sharp cutoffs created 
by the masking procedure. In any case, it can be shown 
(see appendix |A|) that, if w[k] and g [k] satisfy the con- 
straint (|37|), x[k] is a signal whose Gabor coefficients a mn 
are as close as possible (in least squares sense) to a mn . 

A last note regards the fact that, in certain situations, 
the zeroing of coefficients a mn can constitute a too drastic 
operation. In fact, it can introduce some spurious struc- 
tures in the reconstructed signals. In this case, a possible 
alternative is represented by the weighting of coefficients 
a mn , according to the expected level of noise contamina- 
tion, in a way similar to that suggested by Donoho (1995 ) 
(soft thresholding) in the context of wavelets. 

3.2.1. Practical determination of an "optimal" 7 a 

Apart the very common assumption of Gaussianity, the 
typical information available about the noise is an esti- 
mate of its standard deviation. Conversely to the case of 
classical Fourier analysis, the statistical characterization 
of a pure random Gaussian process in the time-frequency 
domain is much more complex. The problem is that coef- 
ficients a mn are not independent. This situation suggests 
that the simplest (and safest) way to determine the thresh- 
old 7 a is the following: 

- simulation of a noise process (not necessarily 
Gaussian) via a pseudo-random generator; 

- calculation of the corresponding SDFS[m,ro] with the 
same modalities used for the calculation of the GR of 
the signal; 

- estimation of a simple statistical index (e.g. empiri- 
cal percentiles), describing the characteristics of such 
a representation, that provides the desired threshold 
for the SDFS[m,rt] of the signal. 

Although very simple, in our numerical simulations this 
method has proved to be rather solid (see Fig.|l(] to com- 
pare with Fig.||). An alternative to the last step could 
be the calculation of a set of filtered time series x[k] cor- 
responding to different choices of j a . After that, among 
these solutions it is chosen the time series that satisfies 
some statistical requirements as, for example, that the 
standard deviation of the residuals between the original 
and the filtered signal is equal to the nominal value of 
noise level. Although appealing, from our numerical ex- 
periments this approach appears to be not so stable. 

In the case one lacks also the information regarding the 
standard deviation of the noise, the simplest solution con- 
sists in considering directly the SDFS [m, n] of the signal 
and then in determining the threshold interactively by de- 
limiting, via some graphical facility, the region where the 
signal is supposed to give its contribution. Another possi- 
bility is the use of some statistical indices of the values of 
SDFS[ra,n] (e.g. percentiles). 
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Fig. 11. a) Number of iteration for the Qian & Chen 
method providing the best reconstruction of the signal of 
Fig.|| as function of the noise threshold j a . b) Standard 
deviation (STD) of the corresponding residuals between 
the true and the reconstructed signals. 

3.2.2. Could the SNR be further improved? 

The synthesis of x[k], via the coefficients a mn , permits to 
filter out the noise contribution corresponding to the coef- 
ficients a mn that have been zeroed. However, what about 
the contamination of the remaining coefficients? In their 



book, Qian & Chen (1996) suggest a method that, ac- 
cording to their opinion, should further improve the SNR 
of the final results. Their idea is very simple and consists 
in an iterative application of the procedure described in 
the previous section. In other words, once obtained the 
filtered x[k], to this signal is applied the same procedure 
used for x[k] and so on. 

The most serious problem of this method consists in 
the lack of theoretical argumentation that guarantee its 
reliability. Indeed, the repeated application of the mask- 
ing procedure corresponds to a contraction, namely to an 
operation for which at the n-th iteration 



\x n [k}\\ 2 < ||x„_i[fc]|| 



(41) 



In other words, the total energy of the filtered signal is 
continuously reduced with the number of iterations. At 
first sight, one could accept this fact as useful since ef- 
fectively the noise tends to increase the power of an ob- 
served time series. Unfortunately, such an increase regards 
the global characteristics of a signal whereas the procedure 
suggested by Qian & Chen is applied to the coefficient a mn 
that reflect its local properties. In other words, given an 
observed Gabor coefficient, the only thing one is allowed 
to claim is that, because of the noise contamination, it 
will be different from the true one. For a correct filter- 
ing, however, we should be able to decide at least whether 
the contaminated coefficient is larger or smaller than the 
uncontaminated one. 
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Observed Signal 



Fig. 12. Contour plot of the DFS[m, n] representation for 
the signal mixture, given by two components disjoint in 
the time- frequency domain, described in the text. Levels 
are uniformly distributed between the 10% and 90% of the 
peak value. The dotted line delimits the two regions of the 
time-frequency domain that have been used to separate 
the components shown in Fig.O. 



The only situation where the Qian & Chen procedure 
can be expected to work is when 7 a is chosen too low. In 
situations like this one, a large fraction of the unmasked 
coefficient a mn could be erroneously assumed linked to the 
signal. In general, these coefficients are smaller, in abso- 
lute value, than the coefficients effectively reflecting the 
signal contribution. Consequently, the contraction con- 
nected with the Qian's & Chen's technique can be able 
to smooth out them before the signal component be too 
negatively affected. However, we stress that, contrary to 
what claimed by the two authors, the improvement is due 
to the removal of Gabor coefficients still related to the 
noise rather that a better estimate of the coefficients rep- 
resenting the contribution of the signal. Better results are 
to be expected with a more appropriate choice of j a . All 
that is shown in the Fig.[n] to compare with Fig.||. 

3.3. Signal separation 

In certain practical situations it can happen that a given 
signal is the mixture of two or more components. Often 
it can be of interest to separate the various contributions. 
In this respect, it can be shown (Hopgood 2000, and 
appendix [^) that, within the context of linear filtering, 
this operation is possible (to a given degree of accuracy) 
only when a representation of signal is available where 
the various components are disjoint. For example, in the 
Fourier domain, two components are "perfectly" separable 
if their power-Spectra do not overlap. Therefore, in the 
context of the joint time-frequency representations, it can 
be expected that separability of signal mixtures is possible 




40 50 60 
Time (s) 



100 



Fig. 13. Observed signal, first component, and second 
component corresponding to the DFS[to, n] representation 
of Fig.^Jj. Continuous line indicates the original signals 
whereas crosses indicate the corresponding reconstruc- 
tions obtained via the method described in the text. 

if the components do not overlap in the time-frequency 
domain (see Figs.|l2| and |l^). A simple and yet effective 
procedure is the same as that presented in section |3.2| : 
each component is reconstructed by zeroing the Gabor 
coefficients corresponding to the remaining ones. 

In order to understand the potential but also the limits 
of such an approach, it is necessary to understand what 
does it mean the fact that two signal are disjoint in the 
time-frequency domain. To discuss this problem, let sup- 
pose to have a signal x(t) given by the sum of two compo- 
nents, say x\(t) and X2 (t). If SDFS[m, n] is interpreted as 
an energy distribution, no overlap in the time-frequency 
plane implies that 



\x(t)\ 2 dt= / |^i(t)| 2 dt+ / \x 2 (t)\ 2 dt 



(42) 



In other words, the energy of the observed signal is equal 
to the sum of the energies corresponding to the single com- 
ponents. This point is very important since indicates that 
X\ (t) and 22 (t) can be separated only if they do not in- 
terference and the total energy is conserved. From the 
statistical point of view this condition means that the two 
components are uncorrelated. In fact, if x\(t) and Xi (t) 
are supposed, without loss of generality, zero mean pro- 
cesses, then the integrals in Eq.(|42|) provide the variance 
of the signals. However, it is well known that 



>x(t) 



'si(t) 



'Mt) 



2a 



Xi(t)x2(t) 



(43) 



where the last term provides the covariance between x\{t) 



and X2(t). Therefore, Eq. (|42[) implies that a 



?W(t)x 2 (t) 



= 0. 



From the above discussion it is evident that, in the 
separation of signals overlapping in the time- frequency do- 
main, things have to be expected much more complex. For 
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Ideal Power Spectrum 




Fig. 14. Grayscale image of the DFS[m, n] representation 
for three chirps described in the text and the correspond- 
ing ideal power-spectrum IPS(i, v). 



example, FigsJlJ shows the SDFS[m, n] corresponding to 
three signals, sampled with a frequency of one Hertz, that 
have been obtained through the following models: 



x(t) = sin[27rf^ a (t)] + sin[27rf^b(i) + 4>i 



(44) 



with % = 1,2,3, v a (t) = 4.0 x 10~ 3 x t + 0.10, v b (t) = 
-3.5 x 10" 3 x t + 0.45, < t < 100 sec, and fa = 0, 
<f>2 — 7r/2, and 03 = 7r rad. In practice, the only differ- 
ence lies in the phase cj> of the second component. The key 
point to note is that, although the SDFS[m, n]'s are differ- 
ent, these signals are characterized by the same IPS(t, v) 
(see FigJTJ). In other words, in this specific case the equal- 
ity ( f42| ) does not hold. Of course, that is the consequence 
of the interference of the components present in x(t). 
Unfortunately, the details of such a interference will de- 
pend not only on the amplitudes of the components at a 
given time instant, but also on the corresponding phases. 
Therefore, their separation should require that each co- 
efficient a mn be able to provide information about four 
parameters. At least of some a priori information (e.g. the 
value of the phases), this is clearly an underdetermined 
problem. The situation becomes worst in case the number 
of overlapping signals is larger than two. 

One time more, these facts suggest that in order to 
obtain some insights on the characteristics of a complex 
system, in general, the analysis of a signal, if not developed 
within a given physical context, is absolutely insufficient. 



4. JTFR better than wavelets? 

Joint time-frequency analysis does not constitute the 
only possible tool for the analysis of non-stationary sig- 
nals. Nowadays the so-called Wavelet Analysis arises as 
a new alternative to JTFA. In the continuous case the 
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Fig. 15. Scalogram for the time series shown in the up- 
permost panel of FigJ|. 

(Scalogram) of a signal x(t) is given by: 

SCL(i,a) = |WT(t,a)| 2 (45) 
with W(t, a) being the continuous wavelet transform given 

by 



WT(t, a) = 



1 



x(s) ( - — - ) da. (46) 



Here ^(t) is a function, named wavelet, with the property 
that f *f?(i) — 0. The idea behind such an approach is 
that, if \T/(i) is centered in the neighborhood of t — and 
has a Fourier transform centered at a frequency vq ^ 0, 
then in the time-frequency plane \& [(t — b)/a] will be cen- 
tered at t = b and v = vo/a, respectively. Consequently, 
SCL(&, a) will reflect the signal behavior in the vicinity of 
such time instant and frequency. The main difference with 
FS(t, v) is that the variable a corresponds to a scale fac- 
tor, in the sense that taking \a\ > 1 dilates \& and taking 
\a\ < 1 compress it. Therefore, in contrast to the analysis 
function w(t) in JTFR which maintains a fixed duration 
(or width) but changes its shape due to frequency modu- 
lation, in wavelet analysis when the scale factor a is varied 
ty(t,a) maintains its shape but changes its duration and 
consequently its bandwidth. The practical consequence is 
that, while FS(i, v) is characterized by a time-frequency 
resolution constant over the entire time-frequency domain, 
SCL(t, a) presents a better time resolution at high fre- 
quencies but a lower resolution at low frequencies (see 
Fig.[l5| to compare with Fig.||). At this point, the natural 
question arises concerning which choice, between the time- 
frequency or wavelet approach, is the most appropriate 
one in the context of the analysis of astrophysical signals. 
Of course, the answer strictly depends on the problem at 
hand. In particular, it is useful to have a representation 
characterized by a non-constant time- frequency resolution 
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Fig. 16. Grayscale image of the DFS representation of a 
3 seconds segment extracted from the X-ray light curve of 
SCO-X1 described in the text. 



when the signal of interest presents a non-stationary evolu- 
tion on very different time scales as, for example, in case of 
non-stationary self similar (fractal) signals. On the other 
hand, especially in problems of exploratory data analysis, 
when the time behavior of a signal is dominated by one 
or more non-stationary processes which evolve on similar 
time scales (e.g. a periodic process that changes slowly its 
period), then it is by far better to use a representation 
with a constant time-frequency resolution since it permits 
an easier tracking of the changing characteristics of the 
signal. In the astrophysical applications, very often this is 
the situation of interest. 

It is necessary to mention that, especially in the field 
of discrete wavelets, the above indicated limitations have 
been overcome in the sense that some approaches have 
been devised that permit a large flexibility in the choice of 
the resolution across the time- frequency plan (e.g., wavelet 
packets, local cosine bases, cf. Mallat 199S| ). In spite of 
that, in the exploratory analysis of the typical astrophys- 
ical signals the joint-time frequency approach maintains 
its supremacy since it is more intuitive and with better 
developed theoretical backgrounds as far as the interpre- 
tation is concerned. Indeed, while JTFA essentially makes 
use of localized sinusoids, the shape of the wavelets is much 
more difficult to explain from the physical point of view. 
Consequently, the results provided by the time-frequency 
approach are more amenable for a direct physical inter- 
pretation. 



5. Practical applications 

To illustrate the analysis strengths of JFTA in astronom- 
ical applications, we have applied some simple JFTA al- 
gorithms to a sequence of X-ray observations of low mass 
X-ray binary Sco X-l (van der Klis, priv. comm.) made 
with the Proportional Counter Array (PCA) on the Rossi- 





Fig. 17. Ridges of the grayscale image |T^. 



XTE spacecraft ( Bradt et al. 1993 ). The time series used, 
shown in Fig.[l6| (lower panel) , is a 3 seconds subset of the 
available sequence that is 1500 seconds long. Sampling is 
regular with a time step of 125 microsecond. As Sco X-l 
shows quasi-periodic oscillations (QPO) in the X-ray light 
emission, its light curve is characterized a periodic com- 
ponent that changes its period over time. Therefore, an 
important experimental activity consists in tracking such 
changes. Figs.[ll],[l7] show that even a very simple JTFR 
as DFS is able to provide very interesting results, whereas 
Figs.|l8|, |p] show how it is possible to extract the compo- 
nent of interest from the observed signal. Although still 
preliminary and partial, this example represents a good 
illustration of the possibilities of the application of the 
technique to astrophysical (and other) time series. 

6. Summary and conclusions 

In this paper we have considered the joint time-frequency 
domain as a possible framework for the analysis of the 
astrophysical signals. The methodologies based on such 
a domain appear very interesting since they allow reli- 
able analyses and filterings of non-stationary time series 
that are very difficult, if not impossible, to carry out in 
the standard Fourier domain. These methodologies appear 
also useful for the separation of signal mixtures. 

Acknowledgements. We thank Prof. M. van der Klis for the 
availability of the unpublished data on Sco X-l and Prof. H. 
Feicthinger for his useful comments. 



Appendix A: Masking of the Gabor Coefficients 
as Least Squares Filtering 

Transforms ([?4|) and (|35| ) can be rewritten as matrix mul- 
tiplications. In particular, the Gabor transform (|35| ) de- 
scribed as a linear mapping from the signal space to the 
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Original DFS 




Time (s) 



Fig. 18. Original vs. filtered DFS representation of a 1.5 
seconds segment extracted from the X-ray light curve of 
SCO-X1 described in the text. Filtering has been carried 
out via the method described in section B.2 with j a cor- 
responding to the 95% percentile of the DFS coefficients. 




Fig. 19. QPO component vs. original time series obtained 
via the filtering shown in Fig.fl8| 



coefficient space C MN takes the form 



g: 



where a = vec{a mn } and G g is an (MN) x L s matrix 

_ * 
w 

Tam w* 

T2AM w * 

Tmam w* 
Majv w* 
MaatTam w* 

MAi\rT 2 AM w* 

MajvTmaji/ w* 

M 2 atv 9 
M 2 awTam w* 



MnanT(m-i)am w * 
MjvaatTmam w * 



G w = { 



(A.2) 



where M q and T 



are, respectively, the modulation and 
the translation operators 



(M q w)[k] = w[k] 
(Tp w)[k] = w[k — p] 



and 



(MqTp w)[k] = w[k — p] e 



i2irqk/ L s 



(A.3) 



(A.l) 



Since G w is a full rank matrix, from Eq.(A.l) it easy to 
show that the transform (|3l|) can be expressed in the form 

x = G^a, (A.5) 

where 

Gl = (G^G w )- x G; (A.6) 

is the pseudo-inverse matrix of G w with dimensions L s x 
(MN). 



Here the key point is that, if in Eq.( A.l ) we suppose a 
and G w fixed and x unknown, then we are dealing with a 
system of M x N equations in L s unknowns. For oversam- 
pled GR the number M x N is larger than L s and therefore 
the system of equations (A.l) is overdetermined. In this 
case, it can be shown that Eq.( A.5) delivers the corre- 
sponding least squares solution ( Bjorck 1990 ). The same 
still holds also when a is not in the range of G w . That 
happens exactly, for example, when the set of a mn does 
not constitutes a valid GR for x[k]. In this case Eq.(A.l) 
has to be rewritten as 

_ LS 



G v 



(A.7) 



in order to highlight that the equality has to be intended 
in least squares sense. However, x[k] will be characterized 
by a GR a mn = G w x. Consequently, Eq.(A.7) can be 
rewritten as 



- ls ^ 
a = a. 



(A.l) 



(A.8) 

The meaning of this last expression is that, al thou gh a mn 
is not a valid GR, £[k] synthesized via Eq.(Aj)) corre- 
sponds to a signal whose GR is as close as possible to a mn 
in least squares sense. 
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Appendix B: Signal Separability 

In this section we briefly present the conditions according 
to which, given an observed signal x(t) = d(t) + n(t), the 
component of interest d(t) and the unwanted component 
n(t) can be separated. 

A signal x(t) may be represented on an arbitrary do- 
main A by the following general integral transform: 



X(X) 



x(t) 



x{t) K(t, A) dt 



X{\) k(X,t) dX 



V A e A 



V t e T 



(B.l) 



(B.2) 



where A is the region in the signal space that the represen- 
tation X{X) lies in []. Here the functions K(t, A), called di- 
rect transform kernel, and k(X, t), called the inverse trans- 
form kernel, are linked by the relationships: 



k(X,t) K(t,X) dX = 5(t-r) (B.3) 



k(X, t) K(t, X) dt = 5(X - A) (B.4) 



For example in case of the Fourier transform A is equal 
to the Fourier frequency v, K(t, v) = exp(— i2irvt) and 
k(y,t) = K*{t,v). 

If signal d(t) has to be reconstructed by linear filtering 
of the observed signal x(t) (i.e., by making use of linear 
opera tors), then it has to hold (see, for example, Hopgood 
2000| ) 



d(t) 



h(t, t) x(t) dr, 



(B.5) 



where h(t, r) is the response at time t given an impulse 
occurred at the filter input at time r. A sufficient condition 
for the existence of h(t, r) is that it can be written in the 
form 



h{t,T) 



k(X,t) K(t,X) dX 



H (X,X) k{X,t) K(t,X) dX dX. 



(B.6) 



Here A^ is the region of the A space over which the spectral 
component of d(t) does not overlap the spectral compo- 
nent of n(t), whereas Ao is the spectral region where these 
two components overlap. Hq(X, A) is the solution of 



P dx {X,X)= H a (A, A) P xx ( A, A) dX, 



V A, A e A 
(B.7) 



where 



Y(X)Z* (X) if y(t), z(t) deterministic; 
E[Y(X)Z*(X)] \iy[t), z(t) stochastic; 

(B.l 



1 In case of stochstic signals, integrals are to be interpreted 
in a mean-square limit. 



is the generalized cross power- spectrum concerning to the 
processes y(t) and z(t). It can be shown (Hopgood 2000) 
that the reconstruction d(t), as provided by Eqs.(B.5) and 



(B.6), is characterized by a variance a 2 (t) = E[\d(t) 
d(t)\ 2 } equal to 



a 2 (t) = P(T(t (A, A) k(X,t) k*(X,t) dX dX (B.9) 

J A J A Q 

with P aa the generalized power-spectrum of a 1 (t) . 
Here the important points are three: 



1) the first term in the rhs of Eq.(B.6) is independent of 
any signal. Moreover, it constitutes a so called ideal 
filter for A^, namely a filter which passes without dis- 
torsion all generalized frequencies components falling 
in Ad and rejects all others; 

2) the second term in the rhs of Eq. flB.6j ) depends, via 
Eq.( |B.7j ), on the crosscorrelation function between x(t) 
and d(t) and therefore, since x(t) = d(t) + n(t), on the 
crosscorrelation between d(t) and n(t); 

3) the magnitude of o~ 2 (t), given by Eq.( |B.9| ), depends on 
the extension of the overlap between d(t) and n(t). 

This last point is particularly interesting since it indicates 
that signals d(t) and n(t) can be perfectly separated only 
when they have disjoint supports in A. Indeed, in such a 
case, Ao is an empty set and the three points above imply, 
respectively, that: 



Eq.(|B.q) simplifies in 



h(t, t)= k(X, t) K(t, A) dX; 



- d(t) and n(t) are uncorrelated signals; 

- a 2 (t) = 0. 

In other words, the perfect separation of two signals, via 
a linear filtering, requires the following condition be sat- 
isfied: 

there exists some domain where the generalized 
spectral representation of the desired components 
are disjoint. Thus, the desired components must be 
uncorrelated processes. 

All that holds also in the context of joint time-frequency 
analysis. The only different thing is that the function 
h{t,r), with arguments in the time domain, has to be 
changed, via a Fourier transform with respect to r, in 
H (t, v) which has arguments in the mixed time-frequency 
domain. 
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